Current status of global conservation and characterisation of wild and cultivated Brassicaceae genetic resources

Abstract Background The economic importance of the globally distributed Brassicaceae family resides in the large diversity of crops within the family and the substantial variety of agronomic and functional traits they possess. We reviewed the current classifications of crop wild relatives (CWRs) in the Brassicaceae family with the aim of identifying new potential cross-compatible species from a total of 1,242 species using phylogenetic approaches. Results In general, cross-compatibility data between wild species and crops, as well as phenotype and genotype characterisation data, were available for major crops but very limited for minor crops, restricting the identification of new potential CWRs. Around 70% of wild Brassicaceae did not have genetic sequence data available in public repositories, and only 40% had chromosome counts published. Using phylogenetic distances, we propose 103 new potential CWRs for this family, which we recommend as priorities for cross-compatibility tests with crops and for phenotypic characterisation, including 71 newly identified CWRs for 10 minor crops. From the total species used in this study, more than half had no records of being in ex situ conservation, and 80% were not assessed for their conservation status or were data deficient (IUCN Red List Assessments). Conclusions Great efforts are needed on ex situ conservation to have accessible material for characterising and evaluating the species for future breeding programmes. We identified the Mediterranean region as one key conservation area for wild Brassicaceae species, with great numbers of endemic and threatened species. Conservation assessments are urgently needed to evaluate most of these wild Brassicaceae.

Introduction
Improving crops to face biotic and abiotic stresses, and to enhance their nutritional value, is essential for ensuring global food security [1]. Ongoing biodiversity loss or decline can have a detrimental effect on future food security, because natural diversity provides resources to overcome challenges to food production such as environmental changes, pests and diseases, or limited land availability [2]. Crop wild relatives (CWRs) hold a wealth of genetic diversity that can be used to improve traditional crops and help them adapt to environmental changes, making them of paramount importance for research and conservation [3,4]. For example, CWRs have recently been used in breeding programmes to transfer key traits to common crops, such as tolerance and resilience to diseases and to abiotic stresses such as salt or drought conditions [5] (and references therein). Previous studies have focused on major crops such as pulses, cereals and forages, and their respective CWRs [6], and relatively little work has been done on oil crops, vegetables or fruits, and minor crops, such as those found in the Brassicaceae family. To date, this family has more than 300 accepted genera and around 4,000 species [7,8], and possesses a wide array of genetic diversity. It owes its economic importance to the widespread use of edible root crops, vegetables and oilseeds [9].
Due to the importance of the family, several phylogenetic studies have aimed to unravel the systematics within Brassicaceae, and several taxonomic circumscriptions have been proposed to divide the family into tribes and to recognise relationships between species [10][11][12][13][14]. In a revision of the family, Al-Shehbaz et al. [15] divided Brassicaceae into 25 tribes but highlighted that further revisions might be required due to the large number of species (400) and genera (100) not yet sequenced. The tribe Brassiceae, formed by eight clades, is the most studied because it includes the Brassica complex or U's triangle [16,17], which is formed by six globally important species of the genus that share three core genomes, termed A, B, and C, that have evolved independently [18]. These studies are useful for understanding the phylogenetic relationships between wild and cultivated species, and also for estimating cross-compatibility between them to enable future breeding [19,20]. However, further sampling efforts and sequence data are still required.

Classification of CWRs
Transfer of genes between CWRs and crops can be challenging because of reproductive barriers between each pair of species [21]. Thus, knowing the level of cross-compatibility between CWRs and their respective crop(s) is key to transferring desirable traits using traditional breeding approaches. There are three different methods to classify CWRs, each aiming to identify the cross-compatibility of a CWR with its respective crop. The most important and accurate classification was proposed by Harlan and de Wet [22], and relies on actual crossing data between a crop and a wild species. They classified CWRs into gene pools (GP): GP1 corresponds to cross-compatible individuals of the same species as the crop, GP2 comprises species for which cross-pollination between the CWR and the crop is successful, and GP3 comprises species that are generally not compatible or whose crosses result in sterile hybrids. However, producing this type of data is challenging, requiring living samples and investment. Thus, other classifications have been proposed for when these resources are lacking. The taxon group (TG) classification [23] aims to estimate evolutionary relatedness based on taxonomic and hierarchical relationships. Four taxon groups were proposed, under which CWRs in the same genus as the crop are assumed to be cross-compatible; however, taxonomic circumscriptions do not necessarily reflect phylogenetic relationships.
More recently, Viruel et al. [24] proposed the use of phylogenetic distances to estimate cross-compatibility, where shorter phylogenetic distances between species equate to a greater possibility of them being cross-compatible. This is a useful tool, especially when there is no information available for the GP classification.

Characterisation of Brassicaceae species
Brassicaceae includes a wide variety of crops and cultivated species, some of the most important of which are within the Brassica genus. Cabbage and broccoli (B. oleracea L.), turnip (B. rapa L.), rapeseed (B. napus L.) and mustard (B. juncea (L.) Czern.) are the main and most economically important crops within the family. There are breeding needs in agriculture that target different traits of these crops, such as resistance to biotic diseases, adaptation or tolerance to abiotic stresses, and improving or enhancing agronomic and functional traits [25].
Efforts to characterise CWRs physiologically and phenotypically, describing traits of interest, have increased in recent years. However, these often focus on plant growth, leaf characterisation and composition, dispersal syndrome and a few traits related to seeds such as germination, storage behaviour and mass (TRY database [26]). As a result, there is less information on their tolerance to biotic and abiotic stresses compared to other key traits. Cultivated Brassicaceae are affected by biotic and abiotic hazards that cause loss of yield and poor performance in the field (especially pests [27] and diseases [28,29]). CWRs of the Brassicaceae family are known to host desirable agronomic traits (compiled in [25,30,31,32]) and abiotic stress tolerance, such as to drought and salinity [33][34][35]. Different successful crosses have been performed to transfer some of these traits between Brassica crops and wild species [36][37][38][39][40]. The potential advantages of these crosses are not limited to agronomic traits and include health-related applications; for example, a CWR of broccoli, Brassica villosa Biv., has been used to increase anti-cancer compounds in a new variety [41].
Understanding the cross-compatibility between a CWR and a crop is essential for identifying the breeding techniques required to incorporate traits from wild species into the cultivar.
However, publications of successful sexual crosses between crops and CWRs of the Brassicaceae family are very limited (reviews mainly on the Brassica genus [39,42,43]), which could be due to the complexity of the process [31] and the lack of knowledge and characterisation of Brassicaceae CWRs. Therefore, it is critical to characterise and understand the cross-compatibility between CWRs and crops of Brassicaceae, as well as to preserve potential CWRs to facilitate their accessibility and conservation for future sustainable use.

Conservation and accessibility
Many CWRs are threatened with extinction by a range of factors such as land use and environmental changes, overexploitation, or invasive species [44]. Kell et al. [45] urged the conservation of at least 78% of the known CWRs in Europe and suggested increasing the use of data on population distribution, trends and size, as well as threat status, to design effective conservation plans. This would require information on the global distribution of wild genetic resources and their current ex situ representation in genebanks. Previous studies have identified gaps for specific areas (e.g., Indonesia [46], USA [47], Middle East [48], Europe [49]), or for specific crops (e.g., Hordeum [50], Capsicum [51], Solanum [52]) and landraces [53]. In the last decade, there has been an increase in ex situ conservation of CWRs [4,6,20,54]. However, there are CWRs of major and minor crops yet to be conserved, especially from the Brassicaceae family, which contains 70 priority CWRs for 17 crops [20]. Although there are scientific publications on Brassica crops, varieties, landraces and wild relatives [30,43], there has been very little focus on the characterisation and conservation of minor crops, or on the evolutionary relationships between crops and wild Brassicaceae.
In this study we aim to review the current classification of CWRs in Brassicaceae; to identify new CWRs potentially cross-compatible with cultivated Brassicaceae, as estimated by phylogenetic distance; and to describe the current geographic distribution and ex situ conservation status of CWRs in Brassicaceae.

Gaps in genetic sequence data and phylogeny
We obtained DNA sequence data for 30% of the species (348 spp. out of a total of 1,242 spp.) for four DNA regions: rbcL (175 spp.), matK (162 spp.), ITS (241 spp.), and trnL-F (277 spp.). Phylogenetic trees were built independently for each of the four genetic markers (Supplementary material Fig. S1), and the phylogenetic trees with the highest resolution and bootstrap support were obtained using matK data (with 131 taxa from the total of 162) and ITS data (214 taxa from the total of 241). In both phylogenetic trees, the tribes and groups were clearly divided in concordance with those defined in previous studies. A comparison of both phylogenetic trees was built to identify similarities and disparities between them (containing 85 common species, Fig. 1). Although the phylogenetic tree for the ITS marker had more DNA sequence data, the genera Armoracia, Barbarea, Crambe and Nasturtium were only present in the tree for the plastid marker matK (Fig. 1). On the other hand, there were more DNA sequences available from the genera Physaria and Isatis for the ITS marker.

Identification of potential cross-compatible CWRs
In this study we distinguished successful conventional crosses of CWRs and crops that produce hybrids from crosses that require biotechnology techniques (e.g., in vitro culture, embryo or ovary rescue). A summary of previous GP and TG classifications for CWRs in Brassicaceae, including 265 taxa (20% of the total 1,242), is listed in Supplementary material Table S1. However, some wild species were assigned to GP2 although no evidence of successful conventional cross-pollination has been published. For example, B. elongata Ehrh. is listed as a GP2 of turnip (B. rapa; Table 1 and Supplementary material Table S1), with no information on crosses between them. Thus, to apply phylogenetic distances (PD) as a proxy for cross-compatibility, we only used as references CWRs with successful sexual crosses with the crop. The PD thresholds used to predict potential cross-compatible CWRs were specific to each crop and ranged from 0 (the closest species in the phylogenetic tree) up to 0.19 (the furthest cross-compatible CWR, Table 1). For example, white mustard, Sinapis alba L., and the wild species Kremeriella cordylocarpus (Coss. & Durieu) Maire are suggested to be cross-compatible (Table 1, PD = 0.0096), because their pairwise phylogenetic distance was lower than that of other known CWRs with reported successful crosses with the crop (Table 1, PD < 0.0125).
Using this method based on phylogenetic distances, we propose 103 new potential CWRs (Table 1, see Supplementary data for more details) to be cross-compatible with 18 cultivated species.
The potentially cross-compatible CWRs estimated using phylogenetic distances will need to be revised considering ploidy level variation, because Brassicaceae species have a large variation in chromosome numbers, from 2n = 150 (Crambe gordjaginii Sprygin & Popov) to 2n = 8 for some Physaria species (Fig. 1, for detailed information see Supplementary data), and crosses between the same ploidy levels are recommended when possible. For example, using phylogenetic distances, we estimated that Brassica gravinae Ten. (2n = 20) is likely cross-compatible with turnip (B. rapa, 2n = 10 and 20, among others). We recommend using the same cytotype (2n = 20) when attempting to cross them. However, only 40% of the wild Brassicaceae species in this study had ploidy level or chromosome number information available (474 taxa), of which 122 are in the ITS and 93 in the matK phylogenetic tree (55 in common, Fig. 1).
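The crop-specific threshold rule described above can be illustrated with a minimal Python sketch. It is an illustration only, not the code used in the study: the function name, the placeholder species names and every distance except the quoted PD = 0.0096 are hypothetical, and treating the threshold as inclusive is an assumption.

```python
from typing import Dict, List, Set


def propose_candidates(pd_to_crop: Dict[str, float],
                       known_compatible: Set[str]) -> List[str]:
    """Wild species at least as close to the crop as the furthest species
    with a reported successful conventional cross."""
    reference = [pd_to_crop[sp] for sp in known_compatible if sp in pd_to_crop]
    if not reference:
        # No crossing data in the tree: no crop-specific threshold can be set.
        return []
    threshold = max(reference)  # assumption: the threshold itself is included
    return sorted(sp for sp, dist in pd_to_crop.items()
                  if sp not in known_compatible and dist <= threshold)


# Distances to white mustard (Sinapis alba). Only the 0.0096 value is quoted
# in the text; the other names and values are hypothetical placeholders.
pd_to_sinapis_alba = {
    "Kremeriella cordylocarpus": 0.0096,
    "known_cwr_with_cross": 0.0125,
    "distant_wild_species": 0.2500,
}
candidates = propose_candidates(pd_to_sinapis_alba, {"known_cwr_with_cross"})
print(candidates)
```

When a crop has no reference species with a confirmed cross, the sketch returns no candidates, which mirrors the situation reported for several minor crops.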
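The recommendation to cross matching cytotypes can be expressed as a simple set intersection. The following Python sketch is illustrative only: the function name is hypothetical, and the chromosome counts are limited to those quoted in the paragraph above.

```python
from typing import Dict, Set


def shared_cytotypes(crop_2n: Set[int], wild_2n: Set[int]) -> Set[int]:
    """Chromosome numbers (2n) reported for both the crop and the wild species."""
    return crop_2n & wild_2n


# Counts as quoted in the text; further B. rapa cytotypes are not listed here.
chromosome_counts: Dict[str, Set[int]] = {
    "Brassica rapa": {10, 20},
    "Brassica gravinae": {20},
}

match = shared_cytotypes(chromosome_counts["Brassica rapa"],
                         chromosome_counts["Brassica gravinae"])
print(sorted(match))  # [20]
```

An empty result would flag a candidate pair for which no common cytotype is documented, so the phylogenetic-distance prediction alone should be treated with more caution.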

Major crops
All the major crops listed in Table 1 were present in both phylogenetic trees and had successful conventional crosses reported that can confirm the cross-compatibility between some species.
Since major crops had more information published, we identified potential CWRs based on the phylogenetic distances, including new genera and species not previously suggested that had shorter distances (Table 1) than the cross-compatible CWRs already identified (Supplementary material Table S1).

Minor crops
In general, there was very little information regarding successful conventional crosses between wild species and minor cultivated crops (perennial wall rocket, Diplotaxis tenuifolia (L.) DC., was the only exception). For some taxa there were not enough sequences (Barbarea genus) or no DNA sequence data (e.g., cultivated Crambe); in others the problem was the lack of cross-compatibility information to use as a reference on the phylogenetic tree (e.g., Eutrema japonicum (Miq.) Koidz. had many wild species with DNA sequence data, but lacked reference species with confirmed cross-compatibility).

Traits for breeding
We identified gaps in the characterisation of the wild Brassicaceae species included in this study by compiling information from different databases to describe the most and least explored and characterised species. The USDA-GRIN Global database on CWRs [55] and the Harlan and de Wet CWR inventory [56] compile traits of CWRs and hold information for 14 cultivated Brassicaceae taxa (Supplementary data) and 171 wild species related to them. Biotic traits are the most studied, followed by fertility traits; together they represent 74% of the available data. The remaining 26% are abiotic and agronomic traits (Fig. 2a). Additionally, the TRY database [26] shows more than 7,000 entries for wild Brassicaceae, with information on potential traits for 599 Brassicaceae species. The main traits captured focus on morphology and physiology (e.g., plant growth, flowering time, dispersal syndrome, Fig. 2a). The genus with the largest number of traits recorded and published is Lepidium (416), followed by Brassica (347, Fig. 2b). However, the five most characterised species, those with the most traits identified, are from the Brassica genus (five of the six species that form the U's triangle, Fig. 2b).

Geographical conservation gaps of Brassicaceae
The distribution of all 1,242 studied taxa confirms the global presence of wild Brassicaceae species, some of which are widely cultivated (e.g., Brassica rapa, B. juncea, Raphanus raphanistrum subsp. sativus (L.) Domin) or introduced. However, introduced species were removed from the analysis to focus on the native distribution of wild Brassicaceae.
For the geographical distribution we used TDWG (Biodiversity Information Standards) level 3 regions; the three regions with the greatest number of native taxa are Turkey (160 species), Spain (147) and Morocco (135, Fig. 3a). There are 787 species yet to be conserved ex situ (i.e., no records available on global databases) and more than 200 that are underrepresented (fewer than five populations conserved ex situ, Supplementary data). The greatest numbers of taxa missing from ex situ collections occur in the Turkey (46), Kazakhstan (32) and Colorado (26) regions (Fig. 3b).
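The ex situ categories used above (no records, or fewer than five conserved populations) amount to a simple classification rule, sketched below in Python. The function name, category labels and accession counts are hypothetical and serve only to illustrate the rule; they are not data from this study.

```python
from collections import Counter


def ex_situ_status(n_populations: int, min_populations: int = 5) -> str:
    """Classify a taxon by the number of populations conserved ex situ."""
    if n_populations < 0:
        raise ValueError("number of populations cannot be negative")
    if n_populations == 0:
        return "not conserved"
    if n_populations < min_populations:
        return "underrepresented"
    return "represented"


# Hypothetical counts of conserved populations per taxon.
populations_per_taxon = {"taxon_A": 0, "taxon_B": 3, "taxon_C": 12, "taxon_D": 0}
summary = Counter(ex_situ_status(n) for n in populations_per_taxon.values())
print(dict(summary))
```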
Conservation status has been evaluated for only 440 species, of which ca. 30% are considered threatened (119 threatened of which 110 are also endemic).The highest number of threatened taxa were found in the Canary Islands (14), peninsular Spain (10), Colorado (11) and Cape Verde regions (9, Fig. 3c, see Supplementary data for more details).Half of the taxa in the database represent single region endemics (667 species).The greatest number of endemic species are found in Turkey (49), peninsular Spain (39) and Colorado regions (36, Fig. 3d, see Supplementary data for more details).
The geographical distribution of the 103 newly proposed cross-compatible CWRs has been defined (Fig. 4), and we observed that almost 70% of these species are not well represented in ex situ conservation or not represented at all (36%, Supplementary data). Unfortunately, more than 70% of them have not been globally evaluated for their conservation status, and the level of threat to their populations is unknown (Supplementary data).

Discussion
Identification of new potential CWRs in Brassicaceae from available DNA data
Various molecular markers have been used to reconstruct phylogenetic trees to distinguish between species and identify clades in Brassicaceae; the most frequently used are rbcL, matK, ITS and trnL-F. In general, rbcL is considered a slow-evolving gene, matK is intermediate, and ITS and trnL-F evolve relatively faster [57]. The choice of markers depends on the desired outcome; for example, a combination of two markers such as rbcL and matK has been suggested to build phylogenetic trees and identify species [58]. In the present study, we compared one nuclear marker (ITS) and one plastid marker (matK, Fig. 1). A more comprehensive phylogeny was recently published using a larger Brassicaceae dataset (one species per genus [11]). The cultivated Brassicaceae were well spread across the supertribes Camelinodae (I) and Brassicodae (II) of the phylogenetic tree [11,13], with all major crops present in Brassicodae (II) (Supplementary material Fig. S1). In our study, both phylogenetic trees (ITS and matK) were congruent regarding the major clusters or groups formed; however, there were a few discrepancies.
The Erucastrum genus was scattered across the Brassiceae tribe in the matK phylogenetic tree (Fig. 1), with Erucastrum species placed in both the Rapa/Oleracea and the Nigra clades, as reported in previous phylogenies [17,32]. In general, only two species are well studied in this genus, E. abyssinicum (A.Rich.) O.E.Schulz and E. gallicum (Willd.) O.E.Schulz (the latter is widely distributed in the USA, Europe and some areas of Asia [60]), and both were present in the two phylogenetic trees, matK and ITS. The division in the matK tree could be due to the distribution of the species; however, more populations should be investigated to verify this. Additionally, some of them are edible [32] and may have been subjected to some type of selection, a potential further reason for their distribution across clades.
Chromosome numbers also play an important role in estimating cross-compatibility between species. The evolution of Brassicaceae species seems to be driven by whole genome duplication events and polyploidy [61]. These polyploidy events are species- and lineage-specific and can affect the relationships of the species in the phylogeny. Polyploidy is present especially in the Brassiceae tribe [62]; for example, polysomaty or mixoploidy (having cells with different numbers of chromosomes in different tissues or cells) has been reported in the Brassica and Raphanus genera [63]. There is a large disparity in chromosome numbers within the family [61,62] (Supplementary data), making ploidy data critical for the identification of cross-compatible species in addition to the phylogenetic distances. For example, variation in ploidy levels could be the reason why Brassica oleracea, a progenitor of B. carinata A.Braun, has few successful crosses and very low rates of hybrids produced [42].
Harlan and de Wet's classification [22] of CWRs enables the identification of potentially compatible wild species. They defined secondary gene pool (GP2) species as those able to transfer genes by conventional crosses, with some possible barriers or lower success rates. They also suggested that gene pools could be separated based on different ploidy levels, but to our knowledge this approach has not been carried out, and it could be challenging when species exhibit multiple ploidy levels. The literature and databases contain a mixture of 'secondary' and 'tertiary' (not compatible or resulting in sterile hybrids) CWRs for which no crosses are found in the literature, or for which crossing had very limited success (Table 1), or even for which biotechnology techniques (e.g., embryo rescue, ovary rescue, somatic hybrids) were required to obtain hybrids. Due to the complexity of the Brassicaceae family and the mixed classifications, it is more difficult for pre-breeders to use some of these wild genetic resources, because in many cases previous classifications were not validated by crosses. Thus, a detailed review of CWR lists corroborated with data from crosses is urgently needed to clarify our current knowledge of Brassicaceae CWRs. In this study we compiled information on successful crosses, chromosome numbers and phylogenetic distances (Fig. 1) to update the classification of CWRs in Brassicaceae and to identify new CWRs that are potentially cross-compatible with crops (Table 1). These newly identified CWRs, proposed using phylogenetic distances, will require characterisation and evaluation for crossability with the crop.

Major crops
As expected, there were more data and publications for well-known crops (Brassica U's triangle, Eruca, Sinapis, Raphanus) and their CWRs than for minor crops, especially regarding breeding and agronomic traits and ploidy level.Despite the Brassica genus comprising most of the major crops, the phylogenetic relationships between species are still far from understood.
Brassiceae is a polyploid tribe [64], which is a challenge for taxonomists and geneticists; more investigation is needed to resolve taxonomic issues and to fully understand the cross-compatibility between species.
There are some incongruences in the literature regarding cross-compatibility within the Brassica U's triangle. For example, despite being in different clades (Nigra and Oleracea respectively [17], Fig. 1) and having longer phylogenetic distances, B. nigra and B. oleracea are classified as secondary CWRs of each other due to successful crosses between them, but only when B. nigra was used as the female parent [42]. In general, most of the crosses that were successful within this tribe had the cultivated species as the female donor [42]. Another successful interspecific sexual hybridisation, between B. rapa and B. fruticulosa (B. rapa being the female parent), was published by Kumar et al. [37], and this technique has been used by other researchers as a bridge to transfer resistance genes from B. fruticulosa to B. juncea [40].
B. rapa and B. juncea are classified as GP3; however, successful crosses between them have been reported [65,66]. On the other hand, B. oleracea is one of the species from which B. napus originated, but the crosses between them produced a very low number of hybrids [42].
Nonetheless, further research is required to improve the success of gene transfer for this species.
There are also successful intergeneric crosses involving the Brassica genus, such as those with Orychophragmus violaceus [68,69], even though the species is distant in the phylogenetic tree (Table 1). Diplotaxis tenuifolia and Erucastrum gallicum were also able to produce hybrids when crossed with Brassica species, although in some cases these were only successful when Brassica was the female parent [70,71]. Intergeneric crosses were also possible with Raphanus species and Eruca vesicaria (L.) Cav., but with low success, and in some cases biotechnology techniques were required to overcome cross-compatibility barriers [72][73][74].

Minor crops
Minor or less common cultivated species such as hedge mustard, cress cultivars or Abyssinian kale are less widely cultivated and thus less information is available for them. Crambe hispanica subsp. abyssinica (Hochst. ex R.E.Fr.) Prina was not included in the phylogenetic tree due to a lack of genetic sequence information. Similarly, characterisation and information about interesting traits within wild species of these minor crops are lacking in the literature. In some cases, there is an issue of self-incompatibility or sterile plants (Armoracia [75]), which makes the breeding process more complex. Due to the limited information about their ploidy levels, it is difficult to identify potential candidates for cross-compatibility with cultivated species. This is the case in the genus Diplotaxis, which exhibits dysploidy (an increase or decrease in chromosome number, by one or more chromosomes, relative to the original number [76]).
However, these CWRs could harbour wide genetic diversity, and future evaluation of their adaptation and traits would be useful. For example, the Barbarea genus is considered a great source of plant defence compounds within the family [77], and some species show resistance to several biotic stresses (mildew, nematodes and thrips [78]). In other genera, medicinal compounds have been reported (Isatis [79], Nasturtium [80,81] and Sisymbrium [82]), and the effect of different environmental conditions has been evaluated (Isatis [83], Nasturtium [84] and Rorippa [85]).
Probably due to the novelty of some of these crops, very little has been done to improve their characteristics, and few traits have been characterised to understand the requirements (if any at this stage) for cultivating these species. Therefore, further investigation is needed, especially to understand and improve their performance and adaptation. Using available data for Brassicaceae, we have identified 103 new potentially cross-compatible CWRs (Table 1, see Supplementary data for more details) for 18 crops, although in general more investigation is needed. More species will need to be sequenced and more data generated (e.g., physiological and phenotypic characterisation, as well as knowledge of ploidy levels). This is key to understanding the needs of the cultivated species and to identifying CWRs with interesting traits. Confirming the cross-compatibility of the new potential CWRs with the same ploidy level is critical, in addition to generating more DNA sequencing data to complete the genetic characterisation of the family.

Cultivated Brassicaceae limitations
A detailed characterisation of plant species is fundamental to understand the limitations of cultivated species.Combining phenotypic and genotypic data will positively impact on improving and transferring traits to major and minor crops as reviewed by Katche et al. [39].
As for the compatibility data, phenotypic and genotypic characterisation is generally available for major crops and non-existent or rare for less well-known cultivated species (Crambe, Nasturtium, or Diplotaxis). The exception observed in Fig. 2b is Lepidium, which was one of the top three genera by number of species characterised for at least one trait; however, this could be due to the large number of accepted species included in the genus (up to 262 spp.).
The most studied traits were those related to the morphology and phenology of the plant in addition to agronomic traits and biotic stress resistance [36,40,73].Despite a recent increase in the study of abiotic stresses (salt and drought tolerance in Brassica [86,87] and Diplotaxis [88]), and characterisation of plants for improving photorespiratory activities (reported in Diplotaxis tenuifolia, D. muralis (L.) DC., D. erucoides (L.) DC. and Moricandia arvensis (L.) DC., which are characterised as C3-C4 intermediate species [89,90]), there is still work to be done especially for minor crops and wild relatives to increase their use in sustainable agriculture.

Key areas for ex situ conservation
CWRs from the Brassica genus that are native to Europe and related to human food were included on a high priority list for threat assessment [45]. Based on the results presented here, the Mediterranean basin and the Middle East are two hotspots for wild and endemic Brassicaceae species (Fig. 3). These areas are two of the centres of origin highlighted by Vavilov and both are hotspots for plant biodiversity [48,92]. For example, Brassica oleracea is thought to have been domesticated in the Eastern Mediterranean [93]. On the other hand, some studies propose Central Asia as the origin of domestication for Brassica rapa, but do not dismiss the possibility of multiple origins of domestication [94]. There are many species that are endemic to China and several regions in Asia that should also be considered for exploring key traits (Fig. 3b).
This study contributes to determining conservation gaps in the Brassicaceae family, identifying the need for further collection and conservation of wild species. We have compiled the information available on 1,242 Brassicaceae species, emphasising the gaps in genetic sequence data (more than 700 spp. lack this information), conservation status (only 400 are conserved ex situ and fewer than 300 are assessed in the IUCN Red List) and trait characterisation, to promote their use as cross-compatible CWRs. The Mediterranean region has been described as a potential hotspot of threatened and endemic Brassicaceae species that have yet to be conserved. Additionally, we have revealed gaps in understanding and evaluating CWRs for this important family (more than 500 species not characterised). Using phylogenetic distances, we proposed 103 new potential CWRs, of which 72 are already conserved ex situ. Once conserved, these CWRs should be characterised physiologically and genetically, requiring the sequencing of more markers (nuclear and plastid) and ploidy studies. This will facilitate their use in future breeding programmes.

Genetic data and phylogenetic analysis
Cultivated Brassicaceae species were identified using Annex I of the International Treaty on Plant Genetic Resources for Food and Agriculture [95] and the scientific literature, gathering a total of 22 major and minor crops from 15 genera. All wild species from the same genera as the cultivated Brassicaceae crops were added to the database. Similarly, using the accepted CWR lists based on gene pool and taxon group classifications [20,55], a further 14 new genera were included, obtaining a final target list of 29 genera and 1,242 taxa.
We used available cross-compatibility data and phylogenetic reconstructions to estimate phylogenetic distances between species, together with differences in ploidy levels, to estimate the potential of each pair of species to be cross-compatible [24]. We used wild species with successful conventional crosses reported in the literature [42,43,55] (represented with a 'Y' and 'Y*' in Table 1) to predict new potential cross-compatible CWRs for their respective crop. We built a phylogenetic tree where pairwise phylogenetic distances between the tips were estimated using the patristic method with the adephylo package [96] (v.1.1.13). The phylogenetic trees were transformed to ultrametric and the branch lengths were standardised so that the distance from the root was 1.0. A threshold was established within the range of phylogenetic distance from a crop taxon to a known cross-compatible wild species.
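As an illustration of the distance-and-threshold step described above, the following Python sketch computes patristic distances on a small hypothetical tree. It is not the adephylo workflow used in this study: the tree, the taxon names and the rule of taking the largest distance from the crop to a known cross-compatible species as the threshold are assumptions made for the example, and ploidy differences are not considered.

```python
# Hypothetical ultrametric toy tree: child -> (parent, branch length).
# Taxon names are placeholders, not species from this study.
TREE = {
    "n1": ("root", 2.0),
    "outgroup": ("root", 6.0),
    "n2": ("n1", 2.0),
    "wild_C": ("n1", 4.0),
    "n3": ("n2", 1.0),
    "wild_B": ("n2", 2.0),
    "crop": ("n3", 1.0),
    "wild_A": ("n3", 1.0),
}


def path_to_root(tree, node):
    """Map each ancestor of `node` (and the node itself) to its distance from `node`."""
    dist, out = 0.0, {node: 0.0}
    while node in tree:
        parent, length = tree[node]
        dist += length
        out[parent] = dist
        node = parent
    return out


def tips(tree):
    """Tips are nodes that never appear as a parent."""
    return set(tree) - {parent for parent, _ in tree.values()}


def rescale_to_unit_depth(tree):
    """Rescale branch lengths so the deepest tip lies at distance 1.0 from the root."""
    depth = max(max(path_to_root(tree, t).values()) for t in tips(tree))
    return {child: (parent, length / depth) for child, (parent, length) in tree.items()}


def patristic(tree, a, b):
    """Sum of branch lengths on the path between two tips (via their closest shared ancestor)."""
    pa, pb = path_to_root(tree, a), path_to_root(tree, b)
    return min(pa[n] + pb[n] for n in pa if n in pb)


def candidate_relatives(tree, crop, known_compatible):
    """Assumed rule: threshold = largest distance from the crop to a known compatible species."""
    dist = {t: patristic(tree, crop, t) for t in tips(tree) if t != crop}
    threshold = max(dist[k] for k in known_compatible)
    new = sorted(t for t, d in dist.items() if d <= threshold and t not in known_compatible)
    return threshold, new


scaled = rescale_to_unit_depth(TREE)
threshold, new_candidates = candidate_relatives(scaled, "crop", {"wild_B"})
print(threshold, new_candidates)
```

In this toy case, any wild tip that is at least as close to the crop as the known cross-compatible species is flagged as a candidate for cross-compatibility testing; it is a prioritisation aid rather than evidence of compatibility.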
Alignments were edited to remove sequences with large gaps and samples with missing data (80% or higher) using AMAS [106]. The phylogenetic tree was built using the maximum likelihood criterion as implemented in IQ-TREE [107] (v.2.0.6), using the substitution model selected by ModelFinder Plus (MFP), which was GTR+G+I. The phylogenetic trees represented in this study included only one sequence per species, which corresponded to the longest sequence available. We also discarded any sequence not clustering with the remaining sequences of the same species in a preliminary analysis. We used Aethionema thomasianum J. Gay as outgroup. The bootstrap was set up with 1,000 replicates and an ultrametric tree was calculated with the phangorn package [108]. To compare and show the two phylogenetic trees we used the cophylo function from the phytools package [109] (v.1.2.0), using ggplot2 [110] and magick [111] to collate the ploidy figures and the trees.
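The sequence filtering rules described above (dropping sequences with 80% or more missing data and keeping the longest sequence per species) can be sketched in a few lines of Python. This is a simplified stand-in and not the AMAS procedure itself: the record layout, the set of characters treated as missing and the accession labels are assumptions for the example.

```python
# Characters treated as missing data in an aligned sequence (an assumption for this sketch).
MISSING = set("-?Nn")


def count_missing(seq):
    """Number of alignment positions with no usable base."""
    return sum(ch in MISSING for ch in seq)


def longest_per_species(records, max_missing=0.8):
    """Keep one accession per species: the one with the most non-missing sites.

    records: iterable of (species, accession, aligned_sequence) tuples.
    Sequences whose missing fraction is >= max_missing are dropped first.
    """
    best = {}
    for species, accession, seq in records:
        if not seq or count_missing(seq) / len(seq) >= max_missing:
            continue  # too little data to be informative
        n_sites = len(seq) - count_missing(seq)
        if species not in best or n_sites > best[species][1]:
            best[species] = (accession, n_sites)
    return {species: accession for species, (accession, _) in best.items()}


# Hypothetical records; accession labels are placeholders.
records = [
    ("Brassica rapa", "acc1", "ACGTACGTAC"),
    ("Brassica rapa", "acc2", "ACGT------"),
    ("Eruca vesicaria", "acc3", "A---------"),
    ("Eruca vesicaria", "acc4", "ACGTAC----"),
]
print(longest_per_species(records))
```

Measuring length as the number of non-missing aligned sites, rather than the raw string length, is a design choice of this sketch, since all sequences in an alignment share the same total length.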

Trait characterisation
Agronomic and physiological traits were obtained from the literature and from several databases such as USDA GRIN-Global [55] (accessed in December 2022) and the Harlan and De Wet CWR inventory [56] (accessed in December 2022) for all the CWRs that had information available. Additionally, 50 seed and plant traits (Supplementary data) were gathered from the TRY database [26]. This database includes specific traits and plant characterisation data that have been published or reported in other databases, research articles or unpublished data.

Distribution and conservation data
The distribution and accepted scientific names were downloaded and matched from the World Checklist of Vascular Plants [60] (version 9, accessed in February 2022) for all taxa. The distribution of introduced species was not included, to focus on the native distribution of wild species. For the geographical distribution we used level 3 of the Biodiversity Information Standards (formerly known as the Taxonomic Databases Working Group, TDWG). We used the IUCN Red List [91] (accessed in September 2022) and the ThreatSearch tool from Botanic Gardens Conservation International (BGCI [112], accessed in September 2022) to assess the global threat status of the Brassicaceae species. Similarly, global records of ex situ collections were gathered using Genesys [113] (data accessed through Genesys in November 2022 via the R package genesysr [114]) and the Millennium Seed Bank Partnership database [115] (accessed in September 2022). The conservation status for the 1,242 species was extracted using the rredlist package [116]. The analysis of the data for this section was performed in R [100] (v.4.2.1) unless otherwise specified. To curate, visualise and analyse the data we used the following R packages: cowplot [117] (v.1.1.1), data.table [118] (v.1.14.8), geojson [119] (v.0.3.5), sf [120] (v.1.0.14) and tidyverse [101] (v.2.0.0).
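The way conservation gaps can be summarised from the combined threat-status and ex situ records is illustrated below with a minimal Python sketch. It is not the R code used for this study: the record layout, the taxon names and the decision to count missing, "NE" and "DD" categories together as unassessed are assumptions for the example.

```python
# Categories counted as "no usable assessment" in this sketch (an assumption).
UNASSESSED = {None, "NE", "DD"}


def conservation_gaps(species):
    """Summarise assessment and ex situ gaps.

    species: dict mapping taxon name -> {"iucn": category or None,
                                         "ex_situ": number of ex situ accessions}.
    """
    n = len(species)
    unassessed = {s for s, rec in species.items() if rec["iucn"] in UNASSESSED}
    no_ex_situ = {s for s, rec in species.items() if rec["ex_situ"] == 0}
    return {
        "share_unassessed_or_dd": len(unassessed) / n,
        "share_without_ex_situ": len(no_ex_situ) / n,
        # Taxa lacking both an assessment and any ex situ record.
        "double_gap": sorted(unassessed & no_ex_situ),
    }


# Hypothetical records; names and values are placeholders.
toy = {
    "taxon_a": {"iucn": "LC", "ex_situ": 12},
    "taxon_b": {"iucn": "DD", "ex_situ": 0},
    "taxon_c": {"iucn": None, "ex_situ": 3},
    "taxon_d": {"iucn": "EN", "ex_situ": 0},
}
print(conservation_gaps(toy))
```

Taxa appearing in the "double_gap" list would be natural first candidates for both collection and assessment under this simple scheme.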

Table 1. List of cultivated Brassicaceae and potential CWR (crop wild relative) species based on phylogenetic distances (PD) between them for the two genes used in the phylogenetic trees, matK and ITS, from Fig. 1. (Y) represents conventional crosses reported for classified CWRs (based on Gene Pool classification listed in Supplementary material Table S1), (Y*) represents conventional crosses with very low success, and (N) represents unsuccessful conventional crosses (or crosses that needed biotechnology). "NI" means no information was found for their crosses [37, 42, 94 and references within them]. 'NA' marks species that were not present in one or both trees. Authorships and IDs for the scientific names of the species are listed in the Supplementary data file. Bold taxa represent the new CWRs identified using the PD (for the complete and detailed list, see Supplementary data [122]).

Figure 1. A comparison of two phylogenetic trees, nuclear marker ITS (a) and plastid marker matK (b), reconstructed using maximum likelihood approaches for the data available in GenBank for Brassicaceae. When the bootstrap is greater or

Figure 2. Top traits from global databases for all Brassicaceae available (a) and the top eight genera and species with the most traits characterised (b). This information was obtained from a total of 599 taxa.

Figure 4. Global distribution of the 103 new estimated cross-compatible CWRs of Brassicaceae using phylogenetic distances. See Supplementary data for more details.

The genus Eutrema was split into two clades in the ITS phylogenetic tree: one containing most Eutrema species and a second with E. violifolium (H.Lév.) Al-Shehbaz & Warwick, E. yungshunense (W.T.Wang) Al-Shehbaz & Warwick, E. xingshanense (Z.E.Chao, Z.L.Ning & X.W.Hu) G.Q.Hao, Al-Shehbaz & J.Quan Liu and E. grandiflorum (Al-Shehbaz) Al-Shehbaz & Warwick, which grouped with a clade formed by the genus Orychophragmus. This is in agreement with other phylogenetic trees [59] where the genus Eutrema was split due to the geographical distribution of the species (all of them occur only in Central China) and these species were clustered with Orychophragmus taxa, which are mostly distributed in China. However, the phylogenetic tree reconstructed with matK resolved Eutrema species in a monophyletic clade, although the aforementioned four species formed a subclade within the genus. This could reflect a different evolutionary history between nuclear and plastid markers (reflected by larger phylogenetic distances, Fig. 1), and those four species are likely to be non-compatible with the wasabi crop (Eutrema japonicum), but further research is needed for this unexplored crop.
Additionally, other Brassicaceae genera (Armoracia, Barbarea, Camelina, Crambe, Diplotaxis, Eruca, Isatis, Lepidium, Raphanus, Rorippa, Sinapidendron and Sinapis) were also considered for the European Red List because of their importance to human and animal food. However, conservation assessments are urgently needed, since almost 80% of the wild Brassicaceae are Data Deficient according to the IUCN Red List [91], or not assessed for their global threatened status (Supplementary data), including 38 taxa that are new CWRs potentially cross-compatible with crops.